This selfstudy assignment should be answered using a single Rmd file which can be run on one of the university servers js{01,02,03,04} in a resonable amount of time. Please add student name in the yaml header (if you are a group of students working together you are allowed to only hand in one answer with all your names on it).
To embed Rcpp code in your Rmd file use the chunk option engine='Rcpp' as below:
#include <Rcpp.h>
// [[Rcpp::export]]
int fibonacci(const int x) {
if (x == 0 || x == 1) return(x);
return (fibonacci(x - 1)) + fibonacci(x - 2);
}
fibonacci(4)
## [1] 3
To complete the exercises you need both to fill in a bit of LaTeX math formulas and some R and C++ code.
Consider the tail probability \(P(X>4)\) where \(X\sim N(0,1)\).
\(P(X>4) = E[1_{X>4}] = \int_{\mathbb{R}} f(x) 1_{x>4} dx\)
set.seed(123)
dat <- rnorm(100000)
indic_mean <- mean(dat>4);indic_mean # Expectation of indicator
## [1] 0.00001
sd(dat) # close to 1
## [1] 0.9997334
pnorm(4, lower.tail = F)
## [1] 0.00003167124
unif_data <- runif(10000, min = 0, max = 0.25)
unif_result <- 0.25 * 1/sqrt(2*pi) * exp(-1/(2*unif_data^2)) * 1/unif_data^2
mean(unif_result)
## [1] 0.00003274824
sd(unif_result)/sqrt(length(unif_result))
## [1] 0.0000009248579
Give a MC estimate and MC std. error for the fraction covered by the following discs:
set.seed(42)
n <- 1000
centers <- data.frame(x=runif(n), y=runif(n))
r <- .03
par(pty = "s")
plot(centers, pch = ".", axes = FALSE, xlab = "", ylab = "")
symbols(centers, circles = rep(r, n), inches = FALSE, bg = gray(.5, .5), add = TRUE)
rect(0,0,1,1, lwd = 3)
afstand_centers <- function(x) {
vec <- (x-centers)
output <- apply(vec, 1, norm,type="2")
return(output)
}
centers_mc <- data.frame(x=runif(100), y=runif(100))
mc_rslt <- apply(centers_mc, 1, afstand_centers)
mean(apply(mc_rslt, 2, min) < r)
## [1] 0.96
sd(apply(mc_rslt, 2, min)< r)/sqrt(100)
## [1] 0.01969464
If you implement this with a for loop I suggest trying to do it in C++.
Consider whether sorting the centers according to e.g. x-coordinate can be used to speed up calculations, and possibly implement this to see if it works.
You could gain speed by not checking the points that are too far away to have the necesarry distance. I.e. if the distance to the x coordinate alone is too much then there is no point calculating the distance.
Using the same idea as before but with only 1 center and the appropriate r, gives the relation of the area of the circle wrt. the square. Then \(\pi\) is isolated in the equation for the area of a circle, i.e \(\pi =\frac{A}{r^2}\).
We see two different ideas, a simple mean of the means, or a weighted mean of the means, weighted by \(n_i\). Same for standard error.
mclapply() (or another parallel processing function) to run 10 independent MC estimates of the integral in problem A and use the formula derived above to combine these to a single MC estimate and MC std. error.library(parallel)
set.seed(123)
indicator_mean_para <- function(n) {
data <- rnorm(n)
return(mean(data>4))
}
vec_para <- rep(10000, 10)
result <- mclapply(vec_para, indicator_mean_para);result
## [[1]]
## [1] 0
##
## [[2]]
## [1] 0
##
## [[3]]
## [1] 0.0001
##
## [[4]]
## [1] 0
##
## [[5]]
## [1] 0
##
## [[6]]
## [1] 0
##
## [[7]]
## [1] 0
##
## [[8]]
## [1] 0
##
## [[9]]
## [1] 0
##
## [[10]]
## [1] 0
mean(unlist(result))
## [1] 0.00001
sd(unlist(result))/sqrt(length(unlist(result)))
## [1] 0.00001
cl <- makeCluster(2)
para_rslt <- parSapply(cl, vec_para, indicator_mean_para)
para_rslt
## [1] 0.0000 0.0000 0.0000 0.0001 0.0000 0.0000 0.0000 0.0001 0.0000 0.0001
microbenchmark::microbenchmark(mclapply(vec_para, indicator_mean_para),
sapply(vec_para, indicator_mean_para),
parSapply(cl, vec_para, indicator_mean_para), times = 5)
## Unit: milliseconds
## expr min lq mean median
## mclapply(vec_para, indicator_mean_para) 6.2735 6.2830 6.39722 6.3388
## sapply(vec_para, indicator_mean_para) 6.3015 6.3114 6.41092 6.3854
## parSapply(cl, vec_para, indicator_mean_para) 4.3445 4.3458 4.40922 4.3590
## uq max neval
## 6.4532 6.6376 5
## 6.5060 6.5503 5
## 4.4079 4.5889 5
stopCluster(cl)
Consider a vector auto-regressive process of order 1, VAR(1) in \(\mathbb{R}^d\) with \(d\times d\) lag parameter matrix \(A\): \[x_t = Ax_{t-1} + e^t\] where \(e_t\) is a process of iid multivariate Gaussian variables \(e^t \sim N_d(\mu, \Sigma)\).
Given the starting state \(x_0 = 0\) we wish to simulate a realization of \(\{x_t\}_{t=1}^n\).
n multivariate Gaussian vectors with mean vector mu and covariance matrix Sigma (simply check that the empirical covariance matrix is close to the given one for a specific 2x2 case of your choice):mvrnormR <- function(mu, Sigma, n) {
d <- length(mu)
stopifnot(nrow(Sigma) == d)
std_norm <- matrix(rnorm(n*d), nrow = d, ncol = n)
mu + t(chol(Sigma)) %*% std_norm
}
mvnorm_data <- mvrnormR(c(1,2), matrix(c(1,0,0,4), nrow = 2), 1000)
var(t(mvnorm_data))
## [,1] [,2]
## [1,] 1.10560046 0.07200583
## [2,] 0.07200583 3.90770029
RcppArmadillo to implement this as a new function mvrnormcpp() and benchmark the two functions for the following input (check that they approximately have the same distribution – you can only use all.equal() to check if you have managed to use the same random numbers in both cases):d <- 2
Sigma <- matrix(c(2,1,1,4), nrow = 2, ncol = 2)
n <- 1000
set.seed(42)
mu <- c(10, 20)
#include <RcppArmadillo.h>
//[[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp ;
using namespace arma ;
// [[Rcpp::export]]
mat mvrnormRcpp(colvec mu, mat Sigma, int n) {
int d = mu.size();
mat std_norm = mat(d,n,fill::randn);
mat chol_sigma = chol(Sigma);
mat output_pre_mu = chol_sigma.t() * std_norm;
mat mu_reppet = mat(d,0);
for (int i = 0;i<n;i++){
mu_reppet.insert_cols(i, mu);
}
mat output = mu_reppet + output_pre_mu;
return output;
}
// [[Rcpp::export]]
mat simcpp(mat A, mat Sigma, int n) {
vec mu = vec(Sigma.n_rows);
mu.fill(0);
mat errors = mvrnormRcpp(mu, Sigma, n);
mat simdata = mat(mu.size(),n,fill::zeros);
for (int i = 1; i < n; i++){
simdata.col(i) = A * simdata.col(i-1) + errors.col(i);
}
return simdata;
}
test_1 <- mvrnormR(mu, Sigma, n)
test_2 <- mvrnormRcpp(mu, Sigma, n)
var(t(test_1))
## [,1] [,2]
## [1,] 1.9770650 0.9226675
## [2,] 0.9226675 3.8900227
var(t(test_2))
## [,1] [,2]
## [1,] 2.0305411 0.9583243
## [2,] 0.9583243 3.9712797
microbenchmark::microbenchmark(mvrnormR(mu, Sigma, n), mvrnormRcpp(mu, Sigma, n))
## Unit: microseconds
## expr min lq mean median uq max neval
## mvrnormR(mu, Sigma, n) 140.1 146.50 164.847 155.95 172.45 309.8 100
## mvrnormRcpp(mu, Sigma, n) 1487.8 1501.25 1556.854 1514.55 1561.30 3090.4 100
Simulation of VAR(1) is obtained with the R function simR() below. Implement this as a C++ function simcpp() which calls mvrnormcpp() you made before and benchmark the C++ and R implementations. We will use A <- matrix(c(0.7,0.2,0.2,0.7),2,2) as the coefficient matrix.
simR <- function(A, Sigma, n) {
mu <- rep(0, nrow(Sigma))
errors <- mvrnormR(mu, Sigma, n)
simdata <- matrix(0, length(mu), n)
for (col in 2:ncol(errors)) {
simdata[,col] = A %*% simdata[,(col-1)] + errors[,col]
}
return(simdata)
}
set.seed(42)
Sigma <- matrix(c(2,1,1,4), nrow = 2, ncol = 2)
n <- 100
A <- matrix(c(0.7,0.2,0.2,0.7),2,2)
# The results are not equal for the two methods, we have tried with different methods of setting the seed, however we can't figure out how to get them to be equal. We have however tried using matrices filled with 1's and using those we achieved the same paths for both the R and Rcpp implementation.
simR(A, Sigma, n)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,] 0 0.5135411 1.219350 3.230870 5.511006 6.370726 4.223284 3.931401
## [2,] 0 1.4407481 1.198552 1.974573 3.338289 8.639607 5.818232 6.012901
## [,9] [,10] [,11] [,12] [,13] [,14]
## [1,] 3.5525662 -0.999637 -0.867053 -1.413478499 1.6918786 0.8716697
## [2,] -0.1754595 1.331727 -2.817068 0.005526308 0.2559429 -2.9629573
## [,15] [,16] [,17] [,18] [,19] [,20] [,21]
## [1,] 0.6682533 0.5575372 1.8209593 1.906412 -0.3307621 -4.387902 -3.640855
## [2,] -2.7717190 -0.1658708 -0.4118704 -2.779289 -3.7108188 -4.303249 -4.419668
## [,22] [,23] [,24] [,25] [,26] [,27] [,28]
## [1,] -2.360328 -4.516346 -5.08525 -4.468328 -2.90061 -0.3881402 -0.2076790
## [2,] -4.645377 -3.881622 -1.49248 -1.140260 -2.93064 -0.3146082 0.2829957
## [,29] [,30] [,31] [,32] [,33] [,34] [,35]
## [1,] 0.8718833 -3.4615597 -3.111565 -1.594750 -1.830952 -0.2662401 1.856896
## [2,] 0.8049529 -0.8456234 -1.197388 1.569595 2.702327 3.7057863 4.540495
## [,36] [,37] [,38] [,39] [,40] [,41] [,42]
## [1,] 0.7327333 1.9233824 0.709481 1.892115 0.5987219 2.466156 2.121197
## [2,] 2.6434062 0.6539461 1.545546 2.634592 -0.4612025 1.349075 1.273943
## [,43] [,44] [,45] [,46] [,47] [,48] [,49]
## [1,] 0.05059063 0.05161596 1.485316 3.540473 3.848584 2.192579 0.3576843
## [2,] 1.61642316 0.64616679 2.660012 2.252607 5.347315 2.116995 -1.6097870
## [,50] [,51] [,52] [,53] [,54] [,55] [,56]
## [1,] 0.04153405 1.772150 0.4154262 0.3840584 0.3588356 0.9082054 0.9594242
## [2,] 0.22327560 2.968361 5.1811000 3.4357735 1.9543769 1.7958333 1.6231668
## [,57] [,58] [,59] [,60] [,61] [,62] [,63]
## [1,] 0.30972122 -2.124035 -2.571588 -3.1247771 -4.122628 -3.471977 -3.486885
## [2,] 0.04154319 -1.798831 3.008288 0.8851053 -3.812467 -5.269616 -5.185614
## [,64] [,65] [,66] [,67] [,68] [,69] [,70]
## [1,] -4.345805 -6.483931 -5.212753 -5.046056 -2.262383 -2.694786 -2.869817
## [2,] -8.549064 -7.383689 -6.986089 -3.832034 -4.726015 -1.595847 -2.086360
## [,71] [,72] [,73] [,74] [,75] [,76] [,77]
## [1,] -2.547918 -3.163621 -3.501495 -3.390920 -1.940044 -2.234268 -1.150330
## [2,] -3.755998 -3.508308 -1.298235 -2.759562 -4.093405 -6.184845 -4.462876
## [,78] [,79] [,80] [,81] [,82] [,83] [,84]
## [1,] -2.359439 -2.862875 -3.993602 -3.616492 -3.287245 -2.4145259 -3.2696352
## [2,] -6.001454 -6.175585 -2.863701 -4.932548 -4.739299 -0.8779393 -0.9488295
## [,85] [,86] [,87] [,88] [,89] [,90] [,91]
## [1,] -2.3584466 -1.892393 -3.547924 -0.7796689 -2.3343800 -0.7638265 -0.2170768
## [2,] 0.4173741 1.223166 2.404485 1.3029690 0.9345009 0.5170540 -0.7764998
## [,92] [,93] [,94] [,95] [,96] [,97] [,98]
## [1,] 0.2143178 -0.1998963 0.3516961 -0.9876319 -1.138103 -1.094697 -3.075420
## [2,] 0.2250683 -2.4969247 -0.2555154 -3.6830809 -3.276356 -4.763303 -2.200189
## [,99] [,100]
## [1,] -2.0218079 0.9973579
## [2,] -0.7724865 0.5794610
simcpp(A, Sigma, n)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 0 2.6077709 0.6071929 -1.1904342 -2.332671941 -1.8178693 -2.4595390
## [2,] 0 -0.8396652 -0.9726912 -0.8612278 -0.005374285 -0.8008187 0.1301978
## [,8] [,9] [,10] [,11] [,12] [,13] [,14]
## [1,] -1.853186 -3.083110 -2.525272 -3.116183 -0.8618157 -0.0627064 0.4307718
## [2,] -1.726499 -2.384791 -3.679333 -7.258178 -4.0451162 3.1160260 0.5382245
## [,15] [,16] [,17] [,18] [,19] [,20] [,21]
## [1,] -0.2248785 -1.990856 -1.586422 -1.8125637 -3.793190 -3.542885 -2.567572
## [2,] -2.5198800 -2.548637 -0.847225 0.6445216 -1.843362 -1.717154 -2.676938
## [,22] [,23] [,24] [,25] [,26] [,27] [,28]
## [1,] -4.190357 -2.517130 -3.464956 -5.318140 -4.846469 -5.692087 -5.775178
## [2,] -6.985798 -2.881131 -6.189841 -4.310736 -5.572288 -4.892774 -6.833105
## [,29] [,30] [,31] [,32] [,33] [,34] [,35]
## [1,] -5.831660 -6.034522 -6.052726 -6.414479 -7.674746 -5.116875 -3.936200
## [2,] -4.243981 -10.532306 -6.893222 -5.511279 -5.387241 -4.861653 -4.639084
## [,36] [,37] [,38] [,39] [,40] [,41] [,42]
## [1,] -3.113123 -1.327967 -1.589138 -0.6059357 -1.675905 -1.8223759 -1.020934
## [2,] -3.774464 -2.976614 -4.906706 -4.4884413 -3.856152 -0.4067211 -2.566372
## [,43] [,44] [,45] [,46] [,47] [,48] [,49]
## [1,] -0.6544794 -1.690895 -2.896684 -2.364319 -3.982514 -6.701732 -4.173244
## [2,] -2.1790918 -1.244442 -3.338028 -6.650214 -6.518828 -4.849553 -1.484200
## [,50] [,51] [,52] [,53] [,54] [,55] [,56]
## [1,] -3.444134 -2.708290 -1.193070 -1.449893 -1.879100 -1.677753 -2.060191
## [2,] -5.228902 -4.727018 -1.930378 -3.323071 -2.699356 -4.947111 -6.190022
## [,57] [,58] [,59] [,60] [,61] [,62] [,63]
## [1,] -2.154310 -0.8536856 -2.823040 -2.526831 -2.327402 -2.013383 -1.143463
## [2,] -1.598533 -3.1859946 -5.650762 -5.765853 -4.665237 -2.943199 -3.163323
## [,64] [,65] [,66] [,67] [,68] [,69] [,70]
## [1,] -3.287741 -2.122614 -2.388447 -1.103937 -3.082572 0.1650339 0.2385417
## [2,] -5.092242 -5.279899 -1.025130 -3.033470 -2.166680 1.6298850 1.5156479
## [,71] [,72] [,73] [,74] [,75] [,76] [,77]
## [1,] 2.398778 0.4573084 0.193190 0.3567415 3.160583 3.76160152 1.7352085
## [2,] 4.227851 2.2238328 4.663105 4.6584806 4.204081 0.08719731 0.2441115
## [,78] [,79] [,80] [,81] [,82] [,83] [,84]
## [1,] 1.241556 1.725473 3.509477 2.104847 -3.049875 -5.7207423 -1.928403
## [2,] -2.909087 -2.418882 -5.179856 -3.713294 -2.058053 0.8456378 -1.617221
## [,85] [,86] [,87] [,88] [,89] [,90] [,91]
## [1,] -1.978317 -1.997106 -3.6897672 -3.8721293 -3.9029739 -5.124172 -1.718046
## [2,] -1.970988 -5.689688 -0.3135198 -0.1950831 -0.6252521 -3.286679 -2.447850
## [,92] [,93] [,94] [,95] [,96] [,97] [,98]
## [1,] -2.740944 -3.959008 -3.177886 -3.860210 -2.0620192 -2.104946 -1.6046242
## [2,] -4.811102 -5.751575 -1.223788 -2.098131 -0.8564968 -2.596450 0.2239889
## [,99] [,100]
## [1,] -0.7923259 0.1611405
## [2,] 3.5891774 -1.8003140
microbenchmark::microbenchmark(simR(A, Sigma, n), simcpp(A, Sigma, n))
## Unit: microseconds
## expr min lq mean median uq max neval
## simR(A, Sigma, n) 154.7 162.4 178.534 167.75 176.70 635.4 100
## simcpp(A, Sigma, n) 40.1 42.9 59.278 45.00 49.55 1104.8 100